AITopics

Country:

Oceania > New Zealand (0.04)
North America > United States > California > Monterey County > Pacific Grove (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Government > Regional Government (0.67)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)

Neural Information Processing Systems Dec-25-2025, 06:33:34 GMT

Mind the Gap: Assessing Temporal Generalization in Neural Language Models

Our world is open-ended, non-stationary, and constantly evolving; thus what we talk about and how we talk about it change over time. This inherently dynamic nature of language contrasts with the current static language modelling paradigm, which trains and evaluates models on utterances from overlapping time periods. Despite impressive recent progress, we demonstrate that Transformer-XL language models perform worse in the realistic setup of predicting future utterances from beyond their training period, and that model performance becomes increasingly worse with time. We find that, while increasing model size alone (a key driver behind recent progress) does not solve this problem, having models that continually update their knowledge with new information can indeed mitigate this performance degradation over time. Hence, given the compilation of ever-larger language modelling datasets, combined with the growing list of language-model-based NLP applications that require up-to-date factual knowledge about the world, we argue that now is the right time to rethink the static way in which we currently train and evaluate our language models, and to develop adaptive language models that can remain up-to-date with respect to our ever-changing and non-stationary world. We publicly release our dynamic, streaming language modelling benchmarks for WMT and arXiv to facilitate language model evaluation that takes temporal dynamics into account.
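The contrast between the static paradigm and the setup argued for above comes down to how the evaluation split is drawn. A minimal sketch, assuming each document is a hypothetical (date, text) pair: a temporal split trains only on documents up to a cutoff and tests only on strictly later ones, so no test document shares a time period with the training data. The function names are illustrative and are not taken from the released benchmarks.

```python
from datetime import date

# Hypothetical document representation: (publication date, text).
Doc = tuple[date, str]


def temporal_split(docs: list[Doc], cutoff: date) -> tuple[list[Doc], list[Doc]]:
    """Train on documents dated on or before `cutoff`; test on strictly later ones."""
    train = [doc for doc in docs if doc[0] <= cutoff]
    test = [doc for doc in docs if doc[0] > cutoff]
    return train, test


def overlaps_in_time(train: list[Doc], test: list[Doc]) -> bool:
    """True if any test document is not later than the newest training document,
    i.e. the two sets come from overlapping time periods (the static setup)."""
    if not train or not test:
        return False
    newest_train = max(d for d, _ in train)
    return any(d <= newest_train for d, _ in test)


docs = [
    (date(2017, 5, 1), "older article"),
    (date(2018, 3, 1), "article"),
    (date(2019, 7, 1), "newer article"),
    (date(2020, 2, 1), "newest article"),
]
train, test = temporal_split(docs, cutoff=date(2018, 12, 31))
print(len(train), len(test), overlaps_in_time(train, test))  # 2 2 False
```

A random shuffle of the same list would, in general, place pre-cutoff documents in the test set, which `overlaps_in_time` would then flag.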

name change, neural language model, temporal generalization

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Madaan, Divyam, Chopra, Sumit, Cho, Kyunghyun

Temporal Generalization: A Reality Check

arXiv.org Artificial Intelligence Sep-30-2025

Machine learning (ML) models often struggle to maintain performance under distribution shifts, leading to inaccurate predictions on unseen future data. In this work, we investigate whether, and under what conditions, models can generalize to future data when relying solely on past data. We explore two primary approaches: convex combinations of past model parameters (parameter interpolation) and explicit extrapolation beyond the convex hull of past parameters (parameter extrapolation). We benchmark several methods within these categories on a diverse set of temporal tasks, including language modeling, news summarization, news tag prediction, academic paper categorization, satellite-image-based land-use classification over time, and historical yearbook photo gender prediction. Our empirical findings show that none of the evaluated methods consistently outperforms the simple baseline of using the latest available model parameters across all scenarios. In the absence of access to future data or robust assumptions about the underlying data-generating process, these results underscore the inherent difficulty of generalizing and extrapolating to future data and warrant caution when evaluating claims of such generalization.
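The two families of methods differ only in where the resulting parameters may lie. A minimal sketch over flat parameter vectors (plain Python lists standing in for model weights): interpolation uses non-negative weights that sum to one, so the result stays inside the convex hull of past checkpoints, while the extrapolation shown here steps past the latest checkpoint along the direction of the last update. This particular extrapolation rule, the checkpoint values, and all function names are illustrative assumptions, not the specific methods benchmarked in the paper; `latest` is the baseline the abstract compares against.

```python
# Parameter vectors are plain lists of floats standing in for flattened model weights.
Params = list[float]


def interpolate(checkpoints: list[Params], weights: list[float]) -> Params:
    """Parameter interpolation: a convex combination of past checkpoints."""
    if len(checkpoints) != len(weights):
        raise ValueError("one weight per checkpoint is required")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    dim = len(checkpoints[0])
    return [sum(w * ckpt[i] for w, ckpt in zip(weights, checkpoints)) for i in range(dim)]


def extrapolate(previous: Params, latest: Params, step: float = 1.0) -> Params:
    """Illustrative parameter extrapolation: latest + step * (latest - previous).
    For step > 0 the result lies outside the convex hull of {previous, latest}."""
    return [cur + step * (cur - prev) for prev, cur in zip(previous, latest)]


theta_2023 = [1.0, 2.0]  # hypothetical checkpoint trained on data up to 2023
theta_2024 = [2.0, 3.0]  # hypothetical checkpoint trained on data up to 2024
checkpoints = [theta_2023, theta_2024]

latest = checkpoints[-1]                                # baseline: newest parameters
mixed = interpolate(checkpoints, [0.5, 0.5])            # [1.5, 2.5]
pushed = extrapolate(theta_2023, theta_2024, step=1.0)  # [3.0, 4.0]
print(latest, mixed, pushed)
```

Every coordinate of `mixed` lies between the corresponding coordinates of the two checkpoints, whereas `pushed` lies beyond `theta_2024`; the reported finding is that neither kind of method reliably beats simply using `latest`.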

artificial intelligence, machine learning, natural language

2509.23487

Country: North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (1.00)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Neural Information Processing Systems Aug-18-2025, 22:02:42 GMT

f5bf0ba0a17ef18f9607774722f5698c-Paper.pdf

large language model, machine learning, natural language

Country:

Oceania > New Zealand (0.04)
North America > United States > California > Monterey County > Pacific Grove (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Government > Regional Government (0.67)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Communications (0.93)

Neural Information Processing Systems Jan-19-2025, 14:06:21 GMT

Mind the Gap: Assessing Temporal Generalization in Neural Language Models

neural language model, recent progress, temporal generalization

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

arXiv.org Artificial Intelligence Jul-10-2024

Is Your LLM Outdated? Evaluating LLMs at Temporal Generalization

Zhu, Chenghao, Chen, Nuo, Gao, Yufei, Zhang, Yunyi, Tiwari, Prayag, Wang, Benyou

The rapid advancement of Large Language Models (LLMs) highlights the urgent need for evolving evaluation methodologies that keep pace with improvements in language comprehension and information processing. However, traditional benchmarks, which are often static, fail to capture the continually changing information landscape, leading to a disparity between the perceived and actual effectiveness of LLMs in ever-changing real-world scenarios. Our study examines temporal generalization, which includes the ability to understand, predict, and generate text relevant to past, present, and future contexts, revealing significant temporal biases in LLMs. We propose an evaluation framework for dynamically generating benchmarks from recent real-world predictions. Experiments demonstrate that LLMs struggle with temporal generalization, showing performance decline over time. These findings highlight the necessity for improved training and updating processes to enhance adaptability and reduce biases. Our code, dataset and benchmark are available at https://github.com/FreedomIntelligence/FreshBench.
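The core idea of evaluating on outcomes that were unknowable at training time can be sketched as follows, assuming each benchmark item records when its real-world outcome was resolved and whether the model predicted it correctly. This is not FreshBench's code; the `Question` fields, dates, and function names are hypothetical. Items resolved on or before the model's knowledge cutoff are discarded, and accuracy is reported per month elapsed since the cutoff, which is where a decline over time would show up.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Question:
    text: str
    resolved_on: date  # hypothetical field: when the real-world outcome became known
    correct: bool      # hypothetical field: whether the model's prediction was right


def months_between(start: date, end: date) -> int:
    """Number of calendar-month boundaries between two dates."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def accuracy_by_month(questions: list[Question], cutoff: date) -> dict[int, float]:
    """Accuracy grouped by months since `cutoff`, using only post-cutoff outcomes."""
    buckets: dict[int, list[bool]] = defaultdict(list)
    for q in questions:
        if q.resolved_on > cutoff:  # outcome could not have been seen in training
            buckets[months_between(cutoff, q.resolved_on)].append(q.correct)
    return {m: sum(hits) / len(hits) for m, hits in sorted(buckets.items())}


questions = [
    Question("resolved before cutoff", date(2024, 1, 15), True),
    Question("one month after", date(2024, 2, 10), True),
    Question("one month after", date(2024, 2, 20), False),
    Question("three months after", date(2024, 4, 5), False),
]
print(accuracy_by_month(questions, cutoff=date(2024, 1, 31)))  # {1: 0.5, 3: 0.0}
```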

llm outdated, temporal generalization

2405.0846

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Shen, Ethan, Singh, Scotty, Kumar, Bhavesh

Generative Visual Question Answering

arXiv.org Artificial Intelligence Jul-18-2023

Multi-modal tasks involving vision and language in deep learning continue to rise in popularity and are leading to the development of newer models that can generalize beyond the extent of their training data. However, current models lack temporal generalization, the ability to adapt to changes in future data. This paper discusses a viable approach to creating an advanced Visual Question Answering (VQA) model that performs well under temporal generalization. We propose a new dataset, GenVQA, which uses images and captions from the VQAv2 and MS-COCO datasets to generate new images through Stable Diffusion. This augmented dataset is then used to test a combination of seven baseline and cutting-edge VQA models. Performance evaluation focuses on questions mirroring the original VQAv2 dataset, with the answers adjusted to the new images. The purpose of this paper is to investigate the robustness of several successful VQA models by assessing their performance on future data distributions. Model architectures are analyzed to identify common stylistic choices that improve generalization under temporal distribution shifts. This research highlights the importance of creating a large-scale future-shifted dataset, which can enhance the robustness of VQA models and improve the ability of future models to adapt to temporal distribution shifts.

machine learning, natural language, question answering

2307.10405

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

arXiv.org Artificial Intelligence Jun-26-2023

Learn over Past, Evolve for Future: Forecasting Temporal Trends for Fake News Detection

Hu, Beizhe, Sheng, Qiang, Cao, Juan, Zhu, Yongchun, Wang, Danding, Wang, Zhengjia, Jin, Zhiwei

Fake news detection has been a critical task for maintaining the health of the online news ecosystem. However, very few existing works consider the temporal shift caused by the rapidly evolving nature of news data in practice, which results in significant performance degradation when models are trained on past data and tested on future data. In this paper, we observe that the appearances of news events on the same topic may display discernible patterns over time, and posit that such patterns can help select training instances that make the model adapt better to future data. Specifically, we design an effective framework, FTT (Forecasting Temporal Trends), which forecasts the temporal distribution patterns of news data and then guides the detector to adapt quickly to the future distribution. Experiments on a real-world temporally split dataset demonstrate the superiority of our proposed framework. The code is available at https://github.com/ICTMCG/FTT-ACL23.
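The premise above is that per-topic news frequencies follow forecastable patterns, so training instances can be favored for topics expected to be prominent in the next period. The sketch below is a deliberately simplified stand-in, not the FTT algorithm: it assumes hypothetical per-topic article counts for past periods, fits an ordinary least-squares linear trend to each topic, forecasts the next period, and normalizes the forecasts into per-topic sample weights. The function names and counts are illustrative.

```python
def linear_forecast(counts: list[float]) -> float:
    """Fit counts[t] ~ a + b*t by least squares and return the (non-negative)
    prediction for the next period, t = len(counts)."""
    n = len(counts)
    if n == 0:
        raise ValueError("need at least one period")
    if n == 1:
        return max(0.0, counts[0])
    t_mean = (n - 1) / 2
    c_mean = sum(counts) / n
    cov = sum((t - t_mean) * (c - c_mean) for t, c in enumerate(counts))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return max(0.0, c_mean + slope * (n - t_mean))


def topic_weights(history: dict[str, list[float]]) -> dict[str, float]:
    """Turn next-period forecasts into weights that sum to 1 (uniform if all are 0)."""
    forecasts = {topic: linear_forecast(counts) for topic, counts in history.items()}
    total = sum(forecasts.values())
    if total == 0:
        return {topic: 1 / len(forecasts) for topic in forecasts}
    return {topic: f / total for topic, f in forecasts.items()}


# Hypothetical article counts per topic over three past months.
history = {"election": [10, 20, 30], "sports": [20, 20, 20]}
print(topic_weights(history))  # election ~0.667, sports ~0.333
```

Here the rising topic is forecast at 40 articles and the flat one at 20, so the rising topic receives twice the weight; the actual FTT forecasting and instance-selection procedure is described in the paper and the linked repository.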

detection, machine learning, natural language

2306.14728

Country:

Asia > China > Hebei Province (0.04)
Asia > China > Gansu Province > Lanzhou (0.04)

Genre: Research Report (0.50)

Industry:

Media > News (1.00)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.34)

arXiv.org Artificial Intelligence Feb-3-2021

Pitfalls of Static Language Modelling

Lazaridou, Angeliki, Kuncoro, Adhiguna, Gribovskaya, Elena, Agrawal, Devang, Liska, Adam, Terzi, Tayfun, Gimenez, Mai, d'Autume, Cyprien de Masson, Ruder, Sebastian, Yogatama, Dani, Cao, Kris, Kocisky, Tomas, Young, Susannah, Blunsom, Phil

Our world is open-ended, non-stationary and constantly evolving; thus what we talk about and how we talk about it change over time. This inherently dynamic nature of language stands in stark contrast to the current static language modelling paradigm, which constructs training and evaluation sets from overlapping time periods. Despite recent progress, we demonstrate that state-of-the-art Transformer models perform worse in the realistic setup of predicting future utterances from beyond their training period, a consistent pattern across three datasets from two domains. We find that, while increasing model size alone (a key driver behind recent progress) does not provide a solution for the temporal generalization problem, having models that continually update their knowledge with new information can indeed slow down the degradation over time. Hence, given the compilation of ever-larger language modelling training datasets, combined with the growing list of language-model-based NLP applications that require up-to-date knowledge about the world, we argue that now is the right time to rethink our static language modelling evaluation protocol, and to develop adaptive language models that can remain up-to-date with respect to our ever-changing and non-stationary world.
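Degradation of this kind is typically measured in perplexity (listed among this entry's keywords). A minimal sketch, assuming per-token natural-log probabilities from some model are already available: perplexity is the exponential of the mean negative log-likelihood per token, and the temporal gap can be expressed as the relative increase on future text over text from the training period. The function names and token probabilities are illustrative, not results from the paper.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """exp(mean negative log-likelihood per token); inputs are natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def relative_increase(ppl_future: float, ppl_in_period: float) -> float:
    """Relative perplexity increase on future text; 0.0 means no degradation."""
    return (ppl_future - ppl_in_period) / ppl_in_period


# Illustrative numbers only: every token gets probability 0.5 in-period, 0.25 in the future.
in_period = perplexity([math.log(0.5)] * 4)  # ~2.0
future = perplexity([math.log(0.25)] * 4)    # ~4.0
print(round(relative_increase(future, in_period), 6))  # 1.0, i.e. 100% worse
```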

evaluation, language model, perplexity

2102.01951

Country:

Asia > Middle East > Jordan (0.04)
Oceania > New Zealand (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Government > Regional Government > Europe Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)